Learning Concept Hierarchies from Text with a Guided Hierarchical Clustering Algorithm
نویسندگان
چکیده
We present an approach for the automatic induction of concept hierarchies from text collections. We propose a novel guided agglomerative hierarchical clustering algorithm exploiting a hypernym oracle to drive the clustering process. By inherently integrating the hypernym oracle into the clustering algorithm, we overcome two main problems of unsupervised clustering approaches relying on the distributional similarity of terms to induce concept hierarchies. First, by only clustering two terms if they have a hypernym in common we make sure that the cluster produced in this way is actually reasonable. Second, by labeling the clusters with the corresponding hypernym we overcome the labeling problem shared by all unsupervised approaches. We present results of a comparison of our approach with Caraballo’s method, assessing the quality of the automatically learned ontologies by comparing them to a handcrafted taxonomy for the tourism domain using the similarity measures of Maedche et al. Further, we also present a human evaluation of the concept hierarchy produced by our guided algorithm.
منابع مشابه
Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis
We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris’ distributional hypothesis and model the context of a certain term as a vector representing syn...
متن کاملHigh-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
متن کاملClustering Concept Hierarchies from Text
Abstract We present a novel approach to learning taxonomies or concept hierarchies from text. The approach is based on Formal Concept Analysis, a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. Our approach is based on the distributional hypothesis, i.e. that nouns or terms are similar to the extent to which they share contexts. F...
متن کاملApplication of modified balanced iterative reducing and clustering using hierarchies algorithm in parceling of brain performance using fMRI data
Introduction: Clustering of human brain is a very useful tool for diagnosis, treatment, and tracking of brain tumors. There are several methods in this category in order to do this. In this study, modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) was introduced for brain activation clustering. This algorithm has an appropriate speed and good scalability in dealing ...
متن کاملخوشهبندی اسناد مبتنی بر آنتولوژی و رویکرد فازی
Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...
متن کامل